{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0d7688a7",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/output_parsing/openai_pydantic_program.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "530c973e-916d-4c9e-9365-e2d5306d7e3d",
   "metadata": {},
   "source": [
    "# LLM Pydantic Program"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18461ba1-6978-4b5b-861e-6dceec36857b",
   "metadata": {},
   "source": [
    "This guide shows you how to generate structured data with our `LLMTextCompletionProgram`. Given an LLM and an output Pydantic class, the program generates a structured Pydantic object.\n",
    "\n",
    "For the target object, you can either specify `output_cls` directly, or pass a `PydanticOutputParser` or any other `BaseOutputParser` that generates a Pydantic object.\n",
    "\n",
    "In the examples below, we show different ways of extracting into the `Album` object (which can contain a list of `Song` objects)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8fcefc79-68b4-4481-b1ef-a902fc12e4c8",
   "metadata": {},
   "source": [
    "## Extract into `Album` class\n",
    "\n",
    "This is a simple example of parsing an output into an `Album` schema, which can contain multiple songs.\n",
    "\n",
    "Pass `Album` as the `output_cls` argument when initializing the `LLMTextCompletionProgram`."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "81e5dde0",
   "metadata": {},
   "source": [
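    "If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙. The programs below do not pass an `llm`, so they use the default LLM, which is OpenAI unless you configure another one; you will therefore also need an `OPENAI_API_KEY` in your environment."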
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2833cea",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7a83b49-5c34-45d5-8cf4-62f348fb1299",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pydantic import BaseModel\n",
    "from typing import List\n",
    "\n",
    "from llama_index.core.program import LLMTextCompletionProgram"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0563f1ba-8086-4dcc-ba35-bfda31c45ae4",
   "metadata": {},
   "source": [
    "Define output schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42053ea8-2580-4639-9dcf-566e8427c44e",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Song(BaseModel):\n",
    "    \"\"\"Data model for a song.\"\"\"\n",
    "\n",
    "    title: str\n",
    "    length_seconds: int\n",
    "\n",
    "\n",
    "class Album(BaseModel):\n",
    "    \"\"\"Data model for an album.\"\"\"\n",
    "\n",
    "    name: str\n",
    "    artist: str\n",
    "    songs: List[Song]"
   ]
  },
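  {
   "cell_type": "markdown",
   "id": "6a1f0c2e-7b3d-4e8a-9c5f-1d2e3f4a5b01",
   "metadata": {},
   "source": [
    "Because the program returns instances of these classes, Pydantic validation applies to whatever the LLM generates. The offline sketch below makes no LLM call: it redefines the two models so that the cell runs on its own and validates hand-written, hypothetical data. A numeric string such as `\"240\"` is coerced to an `int`, while a value that cannot be converted raises a `ValidationError`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a1f0c2e-7b3d-4e8a-9c5f-1d2e3f4a5b02",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import List\n",
    "\n",
    "from pydantic import BaseModel, ValidationError\n",
    "\n",
    "\n",
    "class Song(BaseModel):\n",
    "    \"\"\"Data model for a song.\"\"\"\n",
    "\n",
    "    title: str\n",
    "    length_seconds: int\n",
    "\n",
    "\n",
    "class Album(BaseModel):\n",
    "    \"\"\"Data model for an album.\"\"\"\n",
    "\n",
    "    name: str\n",
    "    artist: str\n",
    "    songs: List[Song]\n",
    "\n",
    "\n",
    "# Hypothetical data shaped like an LLM response; the nested dict becomes a Song.\n",
    "album = Album(\n",
    "    name=\"Example Album\",\n",
    "    artist=\"Example Artist\",\n",
    "    songs=[{\"title\": \"Track 1\", \"length_seconds\": \"240\"}],\n",
    ")\n",
    "print(album.songs[0].length_seconds)  # 240: the string \"240\" was coerced to an int\n",
    "\n",
    "try:\n",
    "    Album(\n",
    "        name=\"Broken Album\",\n",
    "        artist=\"Example Artist\",\n",
    "        songs=[{\"title\": \"Track 1\", \"length_seconds\": \"four minutes\"}],\n",
    "    )\n",
    "except ValidationError as err:\n",
    "    # Each error records the location of the offending field.\n",
    "    print(\"Rejected field:\", err.errors()[0][\"loc\"])"
   ]
  },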
  {
   "cell_type": "markdown",
   "id": "4afff44e-a746-4b9f-85a9-72058bcdd29f",
   "metadata": {},
   "source": [
    "Define the LLM Pydantic program"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fe756697-c299-4f9a-a108-944b6693f824",
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt_template_str = \"\"\"\\\n",
    "Generate an example album, with an artist and a list of songs. \\\n",
    "Using the movie {movie_name} as inspiration.\\\n",
    "\"\"\"\n",
    "program = LLMTextCompletionProgram.from_defaults(\n",
    "    output_cls=Album,\n",
    "    prompt_template_str=prompt_template_str,\n",
    "    verbose=True,\n",
    ")"
   ]
  },
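  {
   "cell_type": "markdown",
   "id": "6a1f0c2e-7b3d-4e8a-9c5f-1d2e3f4a5b03",
   "metadata": {},
   "source": [
    "Inside the triple-quoted template, a backslash at the end of a line suppresses that newline, so the whole template is a single line of text, and `{movie_name}` is a placeholder filled from the keyword arguments passed when the program is called. The offline cell below renders a copy of the template with plain `str.format`, which we assume matches how the placeholder is filled, to show the resulting text. It does not show any format instructions that the output parser may add before the prompt reaches the LLM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a1f0c2e-7b3d-4e8a-9c5f-1d2e3f4a5b04",
   "metadata": {},
   "outputs": [],
   "source": [
    "# A copy of the template above; each trailing backslash removes the newline after it.\n",
    "prompt_template_str = \"\"\"\\\n",
    "Generate an example album, with an artist and a list of songs. \\\n",
    "Using the movie {movie_name} as inspiration.\\\n",
    "\"\"\"\n",
    "\n",
    "rendered = prompt_template_str.format(movie_name=\"The Shining\")\n",
    "print(rendered)\n",
    "# Generate an example album, with an artist and a list of songs. Using the movie The Shining as inspiration."
   ]
  },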
  {
   "cell_type": "markdown",
   "id": "b7be01dc-433e-4485-bab0-36a04c3afbcb",
   "metadata": {},
   "source": [
    "Run the program to get structured output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25d02228-2907-4810-932e-83ec9fc71f6b",
   "metadata": {},
   "outputs": [],
   "source": [
    "output = program(movie_name=\"The Shining\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27ec0777-28d5-494b-b419-daf6bce2b20e",
   "metadata": {},
   "source": [
    "The output is a validated `Album` instance that we can pass on to other functions or APIs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e51bcf4-e7df-47b9-b380-8e5b900a31e1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Album(name='The Overlook', artist='Jack Torrance', songs=[Song(title='Redrum', length_seconds=240), Song(title=\"Here's Johnny\", length_seconds=180), Song(title='Room 237', length_seconds=300), Song(title='All Work and No Play', length_seconds=210), Song(title='The Maze', length_seconds=270)])"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "output"
   ]
  },
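  {
   "cell_type": "markdown",
   "id": "6a1f0c2e-7b3d-4e8a-9c5f-1d2e3f4a5b05",
   "metadata": {},
   "source": [
    "Because the result is a typed `Album`, downstream code can read its fields directly. As a small illustration, the hypothetical `describe_album` function below summarizes an album. To keep the cell runnable without an LLM call, it redefines the models and rebuilds the `Album` shown above by hand."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a1f0c2e-7b3d-4e8a-9c5f-1d2e3f4a5b06",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import List\n",
    "\n",
    "from pydantic import BaseModel\n",
    "\n",
    "\n",
    "class Song(BaseModel):\n",
    "    \"\"\"Data model for a song.\"\"\"\n",
    "\n",
    "    title: str\n",
    "    length_seconds: int\n",
    "\n",
    "\n",
    "class Album(BaseModel):\n",
    "    \"\"\"Data model for an album.\"\"\"\n",
    "\n",
    "    name: str\n",
    "    artist: str\n",
    "    songs: List[Song]\n",
    "\n",
    "\n",
    "def describe_album(album: Album) -> str:\n",
    "    \"\"\"Hypothetical consumer: summarize an album from its typed fields.\"\"\"\n",
    "    total = sum(song.length_seconds for song in album.songs)\n",
    "    minutes, seconds = divmod(total, 60)\n",
    "    return f\"{album.name} by {album.artist}: {len(album.songs)} songs, {minutes}:{seconds:02d}\"\n",
    "\n",
    "\n",
    "# Rebuilt by hand from the output above so this cell needs no LLM call.\n",
    "album = Album(\n",
    "    name=\"The Overlook\",\n",
    "    artist=\"Jack Torrance\",\n",
    "    songs=[\n",
    "        Song(title=\"Redrum\", length_seconds=240),\n",
    "        Song(title=\"Here's Johnny\", length_seconds=180),\n",
    "        Song(title=\"Room 237\", length_seconds=300),\n",
    "        Song(title=\"All Work and No Play\", length_seconds=210),\n",
    "        Song(title=\"The Maze\", length_seconds=270),\n",
    "    ],\n",
    ")\n",
    "print(describe_album(album))  # The Overlook by Jack Torrance: 5 songs, 20:00"
   ]
  },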
  {
   "cell_type": "markdown",
   "id": "f3c3a009-2bfd-4d30-a982-133e4ff41c62",
   "metadata": {},
   "source": [
    "### Initialize with Pydantic Output Parser\n",
    "\n",
    "Passing `output_cls` directly is equivalent to defining a `PydanticOutputParser` and passing it in as `output_parser`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5741f8e9-1b1f-4f57-b6dd-9c8681383686",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.output_parsers import PydanticOutputParser\n",
    "\n",
    "program = LLMTextCompletionProgram.from_defaults(\n",
    "    output_parser=PydanticOutputParser(output_cls=Album),\n",
    "    prompt_template_str=prompt_template_str,\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "66c4fe3c-f60b-4b7e-bf1a-5f3c33397d7e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Album(name='The Fellowship of the Ring', artist='Middle-earth Ensemble', songs=[Song(title='The Shire', length_seconds=240), Song(title='Concerning Hobbits', length_seconds=180), Song(title='The Ring Goes South', length_seconds=300), Song(title='A Knife in the Dark', length_seconds=270), Song(title='Flight to the Ford', length_seconds=210), Song(title='Many Meetings', length_seconds=240), Song(title='The Council of Elrond', length_seconds=330), Song(title='The Great Eye', length_seconds=180), Song(title='The Breaking of the Fellowship', length_seconds=360)])"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "output = program(movie_name=\"Lord of the Rings\")\n",
    "output"
   ]
  },
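  {
   "cell_type": "markdown",
   "id": "6a1f0c2e-7b3d-4e8a-9c5f-1d2e3f4a5b07",
   "metadata": {},
   "source": [
    "As we understand it, `PydanticOutputParser` adds the output class's JSON schema to the prompt, then pulls a JSON object out of the completion and validates it against `output_cls`. The cell below is a simplified, offline stand-in for that parsing step, not LlamaIndex's implementation: the `parse_album_json` helper and the sample completion are hypothetical, and no LLM is called."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a1f0c2e-7b3d-4e8a-9c5f-1d2e3f4a5b08",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import re\n",
    "from typing import List\n",
    "\n",
    "from pydantic import BaseModel\n",
    "\n",
    "\n",
    "class Song(BaseModel):\n",
    "    \"\"\"Data model for a song.\"\"\"\n",
    "\n",
    "    title: str\n",
    "    length_seconds: int\n",
    "\n",
    "\n",
    "class Album(BaseModel):\n",
    "    \"\"\"Data model for an album.\"\"\"\n",
    "\n",
    "    name: str\n",
    "    artist: str\n",
    "    songs: List[Song]\n",
    "\n",
    "\n",
    "def parse_album_json(raw: str) -> Album:\n",
    "    \"\"\"Hypothetical helper: extract the JSON object from raw text and validate it.\"\"\"\n",
    "    # Greedy match from the first \"{\" to the last \"}\" skips any prose around the JSON.\n",
    "    match = re.search(r\"\\{.*\\}\", raw, re.DOTALL)\n",
    "    if match is None:\n",
    "        raise ValueError(f\"No JSON object found in: {raw!r}\")\n",
    "    return Album(**json.loads(match.group()))\n",
    "\n",
    "\n",
    "# Made-up completion text with extra prose around the JSON payload.\n",
    "raw_completion = \"\"\"Here is the album:\n",
    "{\"name\": \"Example Album\", \"artist\": \"Example Artist\",\n",
    " \"songs\": [{\"title\": \"Track 1\", \"length_seconds\": 200}]}\n",
    "Enjoy!\"\"\"\n",
    "\n",
    "print(parse_album_json(raw_completion))"
   ]
  },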
  {
   "cell_type": "markdown",
   "id": "f72755ed-c460-4615-bf65-f3c288391618",
   "metadata": {},
   "source": [
    "## Define a Custom Output Parser\n",
    "\n",
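    "Sometimes you may want to parse the raw LLM output your own way instead of as JSON. Here we subclass `ChainableOutputParser`, implement `parse` for a simple line-based format, and pass the parser in together with `output_cls`."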
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b72fb2a3-0501-4671-8aa0-7e5ad0e3b225",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.output_parsers import ChainableOutputParser\n",
    "\n",
    "\n",
    "class CustomAlbumOutputParser(ChainableOutputParser):\n",
    "    \"\"\"Custom Album output parser.\n",
    "\n",
    "    Assume the first line is \"<album_name>, <album_artist>\".\n",
    "\n",
    "    Assume each subsequent line is a song: \"<song_title>, <song_length_seconds>\".\n",
    "    \"\"\"\n",
    "\n",
    "    def __init__(self, verbose: bool = False):\n",
    "        self.verbose = verbose\n",
    "\n",
    "    def parse(self, output: str) -> Album:\n",
    "        \"\"\"Parse output.\"\"\"\n",
    "        if self.verbose:\n",
    "            print(f\"> Raw output: {output}\")\n",
    "        # Drop blank lines (e.g. a trailing newline) and surrounding whitespace.\n",
    "        lines = [line.strip() for line in output.split(\"\\n\") if line.strip()]\n",
    "        # Split the header on the first comma only.\n",
    "        name, artist = [part.strip() for part in lines[0].split(\",\", 1)]\n",
    "        songs = []\n",
    "        for line in lines[1:]:\n",
    "            # Split on the last comma so a title that contains a comma stays intact.\n",
    "            title, length_seconds = line.rsplit(\",\", 1)\n",
    "            songs.append(Song(title=title.strip(), length_seconds=int(length_seconds)))\n",
    "\n",
    "        return Album(name=name, artist=artist, songs=songs)"
   ]
  },
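  {
   "cell_type": "markdown",
   "id": "6a1f0c2e-7b3d-4e8a-9c5f-1d2e3f4a5b09",
   "metadata": {},
   "source": [
    "The line-based parsing can also be checked without calling an LLM. The cell below is a standalone variant of the parser written as a plain function (`parse_album_lines` is a hypothetical name, not a LlamaIndex API), applied to the first lines of the raw output recorded further down. It strips whitespace, skips blank lines, converts lengths with `int`, and splits each song line on its last comma, so a title such as `Hello, World` still parses."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a1f0c2e-7b3d-4e8a-9c5f-1d2e3f4a5b10",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import List\n",
    "\n",
    "from pydantic import BaseModel\n",
    "\n",
    "\n",
    "class Song(BaseModel):\n",
    "    \"\"\"Data model for a song.\"\"\"\n",
    "\n",
    "    title: str\n",
    "    length_seconds: int\n",
    "\n",
    "\n",
    "class Album(BaseModel):\n",
    "    \"\"\"Data model for an album.\"\"\"\n",
    "\n",
    "    name: str\n",
    "    artist: str\n",
    "    songs: List[Song]\n",
    "\n",
    "\n",
    "def parse_album_lines(output: str) -> Album:\n",
    "    \"\"\"Hypothetical helper: parse the line-based album format into an Album.\"\"\"\n",
    "    # Drop blank lines (e.g. a trailing newline) and surrounding whitespace.\n",
    "    lines = [line.strip() for line in output.splitlines() if line.strip()]\n",
    "    # Header: \"<album_name>, <album_artist>\", split on the first comma only.\n",
    "    name, artist = [part.strip() for part in lines[0].split(\",\", 1)]\n",
    "    songs = []\n",
    "    for line in lines[1:]:\n",
    "        # Split on the last comma so a title that contains a comma stays intact.\n",
    "        title, length_seconds = line.rsplit(\",\", 1)\n",
    "        songs.append(Song(title=title.strip(), length_seconds=int(length_seconds)))\n",
    "    return Album(name=name, artist=artist, songs=songs)\n",
    "\n",
    "\n",
    "# The first lines of the raw output recorded below, including a trailing newline.\n",
    "sample = (\n",
    "    \"Gotham's Reckoning, The Dark Knight\\n\"\n",
    "    \"A Dark Knight Rises, 240\\n\"\n",
    "    \"The Joker's Symphony, 180\\n\"\n",
    ")\n",
    "album = parse_album_lines(sample)\n",
    "print(repr(album.artist))  # 'The Dark Knight' (no leading space)"
   ]
  },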
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa128fd1-93ca-4983-b0c8-9190d4dc5162",
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt_template_str = \"\"\"\\\n",
    "Generate an example album, with an artist and a list of songs. \\\n",
    "Using the movie {movie_name} as inspiration.\\\n",
    "\n",
    "Return answer in following format.\n",
    "The first line is:\n",
    "<album_name>, <album_artist>\n",
    "Every subsequent line is a song with format:\n",
    "<song_title>, <song_length_seconds>\n",
    "\n",
    "\"\"\"\n",
    "program = LLMTextCompletionProgram.from_defaults(\n",
    "    output_parser=CustomAlbumOutputParser(verbose=True),\n",
    "    # output_cls is passed explicitly because the custom parser is not a PydanticOutputParser.\n",
    "    output_cls=Album,\n",
    "    prompt_template_str=prompt_template_str,\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51115206-b781-4b66-965d-ebe3765504c8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> Raw output: Gotham's Reckoning, The Dark Knight\n",
      "A Dark Knight Rises, 240\n",
      "The Joker's Symphony, 180\n",
      "Harvey Dent's Lament, 210\n",
      "Gotham's Guardian, 195\n",
      "The Batmobile Chase, 225\n",
      "The Dark Knight's Theme, 150\n",
      "The Joker's Mind Games, 180\n",
      "Rachel's Tragedy, 210\n",
      "Gotham's Last Stand, 240\n",
      "The Dark Knight's Triumph, 180\n"
     ]
    }
   ],
   "source": [
    "output = program(movie_name=\"The Dark Knight\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cca8432e-a6cb-4fc7-8120-0b7b84b0e755",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Album(name=\"Gotham's Reckoning\", artist=' The Dark Knight', songs=[Song(title='A Dark Knight Rises', length_seconds=240), Song(title=\"The Joker's Symphony\", length_seconds=180), Song(title=\"Harvey Dent's Lament\", length_seconds=210), Song(title=\"Gotham's Guardian\", length_seconds=195), Song(title='The Batmobile Chase', length_seconds=225), Song(title=\"The Dark Knight's Theme\", length_seconds=150), Song(title=\"The Joker's Mind Games\", length_seconds=180), Song(title=\"Rachel's Tragedy\", length_seconds=210), Song(title=\"Gotham's Last Stand\", length_seconds=240), Song(title=\"The Dark Knight's Triumph\", length_seconds=180)])"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "output"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_index_v2",
   "language": "python",
   "name": "llama_index_v2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
